Language model acquisition from a text corpus for speech understanding
Abstract
Speech understanding can be viewed as the problem of translating natural language input, in the form of speech recognition results, into semantic language output. This paper describes the automatic acquisition, from a text corpus, of a language model for translating natural language into semantic language using a stochastic method. The method estimates co-occurrence probabilities of input and output grammar rules and uses them as a translation language model. Because the amount of text is limited, estimating a reliable language model is difficult. We therefore propose a method of concisely modeling the input and output grammars in order to estimate a reliable translation model. Experiments on the ARPA ATIS task show that the method is effective.
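The abstract does not say how the co-occurrence probabilities are estimated. As a rough illustration only, the sketch below assumes plain relative-frequency estimation of P(output rule | input rule) with optional add-k smoothing. The function name, the smoothing choice, and the toy rule pairs are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def estimate_cooccurrence(pairs, k=0.0):
    """Estimate P(output_rule | input_rule) from aligned rule pairs.

    Assumption (not from the paper): relative-frequency counts with
    optional add-k smoothing over the observed output rules.
    """
    pair_counts = Counter(pairs)
    input_counts = Counter(inp for inp, _ in pairs)
    output_rules = {out for _, out in pairs}
    vocab_size = len(output_rules)

    probs = {}
    for inp, inp_count in input_counts.items():
        denominator = inp_count + k * vocab_size
        for out in output_rules:
            numerator = pair_counts[(inp, out)] + k
            if numerator > 0:
                probs[(inp, out)] = numerator / denominator
    return probs

# Hypothetical toy corpus of (input grammar rule, output grammar rule) pairs.
pairs = [
    ("NP -> city", "ARG -> CITY"),
    ("NP -> city", "ARG -> CITY"),
    ("NP -> city", "ARG -> AIRPORT"),
    ("VP -> show NP", "CMD -> LIST ARG"),
]

probs = estimate_cooccurrence(pairs)
smoothed = estimate_cooccurrence(pairs, k=1.0)
```

With little training text, many rule pairs are seen once or never, so unsmoothed estimates are unreliable. This is the kind of sparsity that a more concise grammar, with fewer distinct rules, would reduce.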
Similar resources
Multipass algorithm for acquisition of salient acoustic morphemes
We are interested in spoken language understanding within the domain of automated telecommunication services. Our current methodology involves training statistical language models from large annotated corpora for recognition and understanding. Since the transcribing of large speech corpora is a resource consuming task, we are motivated to exploit speech without transcriptions. In particular, we...
First steps in building a large vocabulary continuous speech recognition system for Vietnamese
This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...
A'lam corpus: a standard named-entity corpus for the Persian language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...
Variation Sets Facilitate Artificial Language Learning
Variation set structure — partial alignment of successive utterances in child-directed speech — has been shown to correlate with progress in the acquisition of syntax by children. The present study demonstrates that arranging a certain proportion of utterances in a training corpus in variation sets facilitates word segmentation and phrase structure learning in miniature artificial languages by ...
Integrated Recognition and Interpretation of Speech for a Construction Task Domain
The development of speech processing front-ends for the controlling of complex systems has received more and more interest during the last years. Usually this task is divided in two subtasks. The speech recogniser records the utterance and puts out a corresponding text, and the speech understanding module tries to extract an internal representation of the meaning of the utterance. As shown in F...
Publication date: 1996